Abstract. A new information-theoretic approach to the central limit theorem for stable laws is presented. The main novelty is the concept of relative fractional Fisher information, which shares most of the properties of the classical one, including Blachman-Stam type inequalities. These inequalities relate the fractional Fisher information of the sum of $n$ independent random variables to the information contained in sums over subsets containing $n-1$ of the random variables. As a consequence, a simple proof of the monotonicity of the relative fractional Fisher information in central limit theorems for stable laws is obtained, together with an explicit decay rate.

Keywords. Central limit theorem, Fractional calculus, Fisher information.

1 Introduction

The entropy functional (or Shannon's entropy) of a real-valued random variable $X$ with density $f$ is defined as

$$H(X) = H(f) = -\int_{\mathbb{R}} f(x)\log f(x)\,dx, \tag{1}$$

provided that the integral makes sense. Among random variables with the same variance $\sigma$, the centered Gaussian $Z$ with variance $\sigma$ has the largest entropy. If the $X_j$'s are independent copies of a centered random variable $X$ with variance 1, then the (classical) central limit theorem implies that the law of the normalized sums

$$S_n = \frac{1}{\sqrt{n}}\sum_{j=1}^{n} X_j$$

converges weakly to the law of the centered standard Gaussian $Z$, as $n$ tends to infinity.
A direct consequence of the entropy power inequality, postulated by Shannon in the forties and subsequently proven by Stam [42] (cf. also Blachman [7]), is that $H(S_2) \ge H(S_1)$: the entropy of the normalized sum of two independent copies of a random variable is not smaller than that of the original. A shorter proof was obtained later by Lieb (cf. also [3, 28, 29] for an exhaustive presentation of the subject). While it was natural to expect that the entire sequence $H(S_n)$ should increase with $n$, as conjectured by Lieb in 1978 [31], a rigorous proof of this result was found only 25 years later by Artstein, Ball, Barthe and Naor.

More recently, simpler proofs of the monotonicity of the sequence $H(S_n)$ have been obtained by Madiman and Barron [36, 37] and by Tulino and Verdú. Madiman and Barron [36], by means of a detailed analysis of variance projection properties, derived new entropy power inequalities for sums of independent random variables and, as a consequence, the monotonicity of entropy in central limit theorems for independent and identically distributed random variables. Tulino and Verdú obtained analogous results by exploiting a projection interpretation based on the minimum mean-squared error. As has been observed, the proofs of the main result in [37] and in the work of Tulino and Verdú share essential similarities.

As suggested by Stam's proof of the entropy power inequality [7, 42], most of the results about monotonicity benefit from the reduction from entropy to another information-theoretic notion, the Fisher information of a random variable. For sufficiently regular densities, the Fisher information can be written as

$$I(X) = I(f) = \int_{\{f>0\}} \frac{|f'(x)|^2}{f(x)}\,dx. \tag{2}$$

Among random variables with the same variance $\sigma$, the Gaussian $Z$ has the smallest Fisher information $1/\sigma$. Fisher information and entropy are related to each other by the so-called de Bruijn relation. If $u(x,t) = u_t(x)$ denotes the solution to the initial value problem for the heat equation in the whole space $\mathbb{R}$,

$$\frac{\partial u}{\partial t} = \frac{\partial^2 u}{\partial x^2}, \tag{3}$$

starting from an initial probability density function $f(x)$, then

$$I(f) = \frac{d}{dt} H(u_t)\Big|_{t=0}.$$

A particularly clear explanation of this link is given in the article of Carlen and Soffer [13] (cf. also Barron [3] and Brown).

It is worth noting that the connection between Fisher information and the central limit theorem was observed, at the time of Stam's proof of the entropy power inequality, by Linnik [34], who first used Fisher information in a proof of the central limit theorem.

Recently, the role of Fisher information in limit theorems has also been considered in situations different from the classical central limit theorem. Bobkov, Chistyakov and Götze [10] highlighted the possibility of using the relative Fisher information to study convergence towards a stable law [18, 20, 30]. If the $X_j$'s are independent copies of a centered random variable $X$ which lies in the domain of normal attraction of a random variable $Z_\lambda$ with Lévy symmetric stable distribution, the central limit theorem for stable laws implies that the law of the normalized sums

$$T_n = \frac{1}{n^{1/\lambda}}\sum_{j=1}^{n} X_j \tag{4}$$

converges weakly to the law of the centered stable $Z_\lambda$, as $n$ tends to infinity. Given a random variable $X$ with probability distribution $G(x)$, the Fourier transform of $G(x)$ will be denoted by

$$\mathcal{F}G(\xi) = \widehat{G}(\xi) = \int_{\mathbb{R}} e^{-i\xi x}\,dG(x), \qquad \xi \in \mathbb{R}.$$

In case $X$ possesses a probability density $g(x) = G'(x)$, we will still denote the Fourier transform of $g(x)$ by

$$\mathcal{F}g(\xi) = \widehat{g}(\xi) = \int_{\mathbb{R}} e^{-i\xi x} g(x)\,dx, \qquad \xi \in \mathbb{R}.$$

A Lévy symmetric stable distribution $L_\lambda(x)$ of order $\lambda$ is defined in terms of its Fourier transform by

$$\widehat{L}_\lambda(\xi) = e^{-|\xi|^\lambda}. \tag{5}$$

While the Gaussian density is related to the linear diffusion equation (3), Lévy distributions are deeply related to linear fractional diffusion equations

$$\frac{\partial u(x,t)}{\partial t} = \mathcal{D}_{2\alpha} u(x,t). \tag{6}$$

For the classical diffusion case described by (3), $\alpha = 1$ and the diffusion operator models a Brownian diffusion process. For fractional diffusion, where $1/2 < \alpha < 1$, the process described by the operator $\mathcal{D}_{2\alpha}$ in (6) is commonly referred to as anomalous diffusion, and the underlying stochastic process is a Lévy stable flight.

Indeed, the fractional diffusion equation (6) can be fruitfully described in terms of Fourier variables, where it takes the form

$$\frac{\partial \widehat{u}(\xi,t)}{\partial t} = -|\xi|^{2\alpha}\,\widehat{u}(\xi,t). \tag{7}$$

Equation (7) can be easily solved. Its solution

$$\widehat{u}(\xi,t) = \widehat{u}_0(\xi)\,e^{-|\xi|^{2\alpha} t} \tag{8}$$

shows that the fractional diffusion equation admits a fundamental solution, given by a Lévy distribution of order $\lambda = 2\alpha$ scaled in time.
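
The statement that the fundamental solution is a time-scaled Lévy law can be checked numerically in Fourier variables. The following Python sketch is illustrative only: the function names `levy_cf`, `fractional_heat_multiplier` and `max_scaling_gap` are hypothetical and do not come from the paper. It assumes the multiplier $e^{-|\xi|^{2\alpha}t}$ for the fractional diffusion equation and compares it with the characteristic function $e^{-|\xi|^{\lambda}}$ of the Lévy law of order $\lambda = 2\alpha$, evaluated at the rescaled frequency $t^{1/\lambda}\xi$.

```python
import math


def levy_cf(xi: float, lam: float) -> float:
    """Characteristic function exp(-|xi|^lam) of the symmetric Levy law of order lam."""
    return math.exp(-abs(xi) ** lam)


def fractional_heat_multiplier(xi: float, t: float, alpha: float) -> float:
    """Fourier multiplier exp(-|xi|^(2*alpha) * t) of the fractional diffusion equation."""
    return math.exp(-(abs(xi) ** (2.0 * alpha)) * t)


def max_scaling_gap(alpha: float, times, freqs) -> float:
    """Largest gap between the multiplier at (xi, t) and the Levy law at t^(1/lam) * xi."""
    lam = 2.0 * alpha
    return max(
        abs(fractional_heat_multiplier(xi, t, alpha) - levy_cf(t ** (1.0 / lam) * xi, lam))
        for t in times
        for xi in freqs
    )


if __name__ == "__main__":
    # The gap should be at the level of floating-point rounding.
    print(max_scaling_gap(0.75, [0.1, 1.0, 5.0], [-2.0, -0.5, 0.0, 0.3, 1.7]))
```

Because the exponents add in $t$, the same multiplier also satisfies the semigroup property: the product of the multipliers at times $t_1$ and $t_2$ equals the multiplier at time $t_1 + t_2$.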

Equation (8) reveals a strong analogy between the solution of the heat equation (3) and the solution to the fractional diffusion equation (6), and suggests that information-theoretic techniques used for the former could be fruitfully used for the latter, once suitably adapted to the new situation [19].

In contrast with the analysis of Bobkov, Chistyakov and Götze [10], in this note we will show that this analogy can be reasonably stated at the level of Fisher information by resorting to a new definition, which in our opinion is better adapted to the anomalous diffusion case. In Section 2 we introduce a generalization of (relative) Fisher information, the relative fractional Fisher information, that is constructed to vanish on the Lévy symmetric stable distribution, and shares most of the properties of the classical one defined by (2). By using this new definition, we will present in Section 3 an information-theoretic proof of monotonicity of the fractional Fisher information of the normalized sums (4) by adapting the ideas developed in [37]. In contrast with the classical case, it will be shown in Section 3 that the monotonicity also implies convergence in fractional Fisher information at a precise rate, which depends on the value of the exponent that characterizes the Lévy distribution.

Fractional-in-space diffusion equations appear in many contexts. Among others, the review paper by Klafter et al. [25] provides numerous references to physical phenomena in which these anomalous diffusions occur (cf. [6, 15] and the references therein for various details on both mathematical and physical aspects). Also, fractional diffusion equations in the nonlinear setting have been intensively studied in recent years by Caffarelli, Vázquez et al. The reading of these papers was essential in directing my interest towards the present topic.

2 Scores and fractional Fisher information 

In the rest of this paper, unless explicitly stated otherwise, and without loss of generality, we will always assume that any random variable $X$ we consider is centered, i.e. $E(X) = 0$, where as usual $E(\cdot)$ denotes mathematical expectation.

In this section we briefly summarize the mathematical notation and the meaning of the fractional diffusion operator $\mathcal{D}_{2\alpha}$ which appears in equation (6). For $0 < \alpha < 1$ we let $R_\alpha$ be the one-dimensional normalized Riesz potential operator defined for locally integrable functions by [39, 43]

$$R_\alpha(f)(x) = S(\alpha)\int_{\mathbb{R}} \frac{f(y)\,dy}{|x-y|^{1-\alpha}}.$$

The constant $S(\alpha)$ is chosen to have

$$\widehat{R_\alpha(f)}(\xi) = |\xi|^{-\alpha}\,\widehat{f}(\xi). \tag{9}$$




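Since for $0 < \alpha < 1$ it holds [32]

$$\mathcal{F}|x|^{\alpha-1}(\xi) = |\xi|^{-\alpha}\,\pi^{1/2}\,2^{\alpha}\,\frac{\Gamma\left(\frac{\alpha}{2}\right)}{\Gamma\left(\frac{1-\alpha}{2}\right)},$$

where, as usual, $\Gamma(\cdot)$ denotes the Gamma function, the value of $S(\alpha)$ is given by

$$S(\alpha) = \left[\pi^{1/2}\,2^{\alpha}\,\frac{\Gamma\left(\frac{\alpha}{2}\right)}{\Gamma\left(\frac{1-\alpha}{2}\right)}\right]^{-1}. \tag{10}$$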
Note that S{a) = ^(l — a).
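
As a sanity check on the normalization of the Riesz potential, the sketch below (Python, standard library only; the names `riesz_symbol_constant`, `riesz_symbol_constant_gamma_form` and `S` are illustrative, not the paper's) assumes the Fourier convention $\widehat{f}(\xi) = \int e^{-i\xi x} f(x)\,dx$ used in this text. Under that convention the transform of $|x|^{\alpha-1}$ is $2\Gamma(\alpha)\cos(\pi\alpha/2)\,|\xi|^{-\alpha}$, which the duplication and reflection formulas for the Gamma function rewrite as $2^{\alpha}\pi^{1/2}\,\Gamma(\alpha/2)/\Gamma((1-\alpha)/2)\,|\xi|^{-\alpha}$. The code evaluates both forms so they can be compared, and returns $S(\alpha)$ as the reciprocal of the constant.

```python
import math


def riesz_symbol_constant(alpha: float) -> float:
    """Constant c(alpha) with F|x|^(alpha-1) = c(alpha) |xi|^(-alpha), for 0 < alpha < 1."""
    return 2.0 * math.gamma(alpha) * math.cos(math.pi * alpha / 2.0)


def riesz_symbol_constant_gamma_form(alpha: float) -> float:
    """The same constant written with Gamma(alpha/2) and Gamma((1 - alpha)/2)."""
    return (2.0 ** alpha) * math.sqrt(math.pi) * math.gamma(alpha / 2.0) / math.gamma((1.0 - alpha) / 2.0)


def S(alpha: float) -> float:
    """Normalization making the Riesz potential act as |xi|^(-alpha) in Fourier variables."""
    return 1.0 / riesz_symbol_constant(alpha)


if __name__ == "__main__":
    for a in (0.1, 0.25, 0.5, 0.75, 0.9):
        print(a, riesz_symbol_constant(a), riesz_symbol_constant_gamma_form(a), S(a))
```

A different Fourier convention (for instance one with a factor $2\pi$ in the exponent) changes the constant, so the numbers printed here apply only under the stated assumption.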

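We then define the fractional derivative of order $\alpha$ of a real function $f$ as ($0 < \alpha < 1$)

$$\frac{d^\alpha f(x)}{dx^\alpha} = \mathcal{D}_\alpha f(x) = \frac{d}{dx} R_{1-\alpha}(f)(x). \tag{11}$$

Thanks to (9), in Fourier variables

$$\widehat{\mathcal{D}_\alpha f}(\xi) = i\,\frac{\xi}{|\xi|}\,|\xi|^{\alpha}\,\widehat{f}(\xi). \tag{12}$$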


In theoretical statistics, the score or efficient score is the derivative, with respect to some parameter $\theta$, of the logarithm of the likelihood function (the log-likelihood). If the observation is $X$ and its likelihood is $L(\theta; X)$, then the score $\rho_L(X)$ can be found through the chain rule

$$\rho_L(\theta, X) = \frac{1}{L(\theta; X)}\,\frac{\partial L(\theta; X)}{\partial \theta}. \tag{13}$$

Thus the score indicates the sensitivity of $L(\theta; X)$ (its derivative normalized by its value). In older literature, the term linear score refers to the score with respect to an infinitesimal translation of a given density. In this case, the likelihood of an observation is given by a density of the form $L(\theta; X) = f(X + \theta)$. According to this definition, given a random variable $X$ in $\mathbb{R}$ distributed with a differentiable probability density function $f(x)$, its linear score $\rho$ (at $\theta = 0$) is given by

$$\rho(X) = \frac{f'(X)}{f(X)}. \tag{14}$$

The linear score has zero mean, and its variance is just the Fisher information (2) of $X$.

Also, the notion of relative score has been recently considered in information theory [22] (cf. also [10]). For every pair of random variables $X$ and $Y$ with differentiable density functions $f$ (respectively $g$), the score function of the pair relative to $X$ is represented by

$$\tilde{\rho}(X) = \frac{f'(X)}{f(X)} - \frac{g'(X)}{g(X)}. \tag{15}$$

In this case, the Fisher information between $X$ and $Y$, relative to $X$, is just the variance of $\tilde{\rho}(X)$. This notion is satisfying because it represents the variance of some error due to the mismatch between the prior distribution $f$ supplied to the estimator and the actual distribution $g$. Obviously, whenever $f$ and $g$ are identical, the relative Fisher information is equal to zero.

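Let $z_\sigma(x)$ denote the Gaussian density in $\mathbb{R}$ with zero mean and variance $\sigma$

$$z_\sigma(x) = \frac{1}{\sqrt{2\pi\sigma}}\exp\left(-\frac{x^2}{2\sigma}\right). \tag{16}$$

Then a Gaussian random variable $Z_\sigma$ of density $z_\sigma$ is uniquely defined by the score function

$$\rho(Z_\sigma) = -Z_\sigma/\sigma.$$

Also, the relative (to $X$) score function of $X$ and $Z_\sigma$ takes the simple expression

$$\tilde{\rho}(X) = \frac{f'(X)}{f(X)} + \frac{X}{\sigma}, \tag{17}$$

which induces a (relative to the Gaussian) Fisher information

$$\tilde{I}(X) = \tilde{I}(f) = \int_{\{f>0\}} \left(\frac{f'(x)}{f(x)} + \frac{x}{\sigma}\right)^2 f(x)\,dx. \tag{18}$$

Clearly, $\tilde{I}(X) \ge 0$, while $\tilde{I}(X) = 0$ if $X$ is a centered Gaussian variable of variance $\sigma$.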

The concept of linear score can be naturally extended to cover fractional derivatives. Given a random variable $X$ in $\mathbb{R}$ distributed with a probability density function $f(x)$ that has a well-defined fractional derivative of order $\alpha$, with $0 < \alpha < 1$, its linear fractional score, denoted by $\rho_{\alpha+1}$, is given by

$$\rho_{\alpha+1}(X) = \frac{\mathcal{D}_\alpha f(X)}{f(X)}. \tag{19}$$

Thus the linear fractional score indicates the nonlocal (fractional) sensitivity of $f(X + \theta)$ at $\theta = 0$ (its fractional derivative normalized by its value). In contrast with the classical case, the fractional score of $X$ is linear in $X$ if and only if $X$ is a Lévy distribution of order $\alpha + 1$. Indeed, for a given positive constant $C$, the identity

$$\rho_{\alpha+1}(X) = -CX$$

is verified if and only if, on the set $\{f > 0\}$,

$$\mathcal{D}_\alpha f(x) = -C x f(x). \tag{20}$$

Passing to Fourier transform, this identity yields

$$i\,\frac{\xi}{|\xi|}\,|\xi|^{\alpha}\,\widehat{f}(\xi) = -iC\,\frac{\partial \widehat{f}(\xi)}{\partial \xi},$$

and from this follows

$$\widehat{f}(\xi) = \widehat{f}(0)\exp\left\{-\frac{|\xi|^{\alpha+1}}{C(\alpha+1)}\right\}. \tag{21}$$

Finally, by choosing $C = (\alpha+1)^{-1}$, and imposing that $f(x)$ is a probability density function (i.e. by fixing $\widehat{f}(\xi = 0) = 1$), we obtain that the Lévy stable law of order $\alpha + 1$ is the unique probability density solving (20).

It is important to remark that, unlike in the case of the linear score, the variance of the fractional score is in general unbounded. One can easily realize this by looking at the variance of the fractional score in the case of a Lévy variable. For a Lévy variable, in fact, the variance of the fractional score coincides with a multiple of its variance, which is unbounded [20, 30]. For this reason, a consistent definition in this case is represented by the relative fractional score. By (21), a Lévy random variable $Z_\lambda$ of density $z_\lambda$, with $1 < \lambda < 2$, is uniquely defined by a linear fractional score function

$$\rho_\lambda(Z_\lambda) = -\frac{Z_\lambda}{\lambda}.$$

Hence the relative (to $X$) fractional score function of $X$ and $Z_\lambda$ assumes the simple expression

$$\tilde{\rho}_\lambda(X) = \frac{\mathcal{D}_{\lambda-1} f(X)}{f(X)} + \frac{X}{\lambda}, \tag{22}$$

which induces a (relative to the Lévy) fractional Fisher information (in short, $\lambda$-Fisher relative information)

$$I_\lambda(X) = I_\lambda(f) = \int_{\{f>0\}} \left(\frac{\mathcal{D}_{\lambda-1} f(x)}{f(x)} + \frac{x}{\lambda}\right)^2 f(x)\,dx. \tag{23}$$

The fractional Fisher information is always greater than or equal to zero, and it is equal to zero if and only if $X$ is a Lévy symmetric stable distribution of order $\lambda$. In contrast with the standard relative Fisher information, $I_\lambda$ is well-defined whenever the random variable $X$ has a probability density function which is suitably close to the Lévy stable law (typically, it lies in a subset of the domain of attraction). We will denote by $\mathcal{P}_\lambda$ the set of probability density functions such that $I_\lambda(f) < +\infty$, and we will say that a random variable $X$ lies in the domain of attraction of the $\lambda$-Fisher information if $I_\lambda(X) < +\infty$. More generally, for a given positive constant $\upsilon$, we will consider other relative fractional score functions given by

$$\tilde{\rho}_{\lambda,\upsilon}(X) = \frac{\mathcal{D}_{\lambda-1} f(X)}{f(X)} + \frac{X}{\lambda\upsilon}. \tag{24}$$

This leads to the relative fractional Fisher information

$$I_{\lambda,\upsilon}(X) = I_{\lambda,\upsilon}(f) = \int_{\{f>0\}} \left(\frac{\mathcal{D}_{\lambda-1} f(x)}{f(x)} + \frac{x}{\lambda\upsilon}\right)^2 f(x)\,dx. \tag{25}$$

Clearly, $I_\lambda = I_{\lambda,1}$. Analogously, we will denote by $\mathcal{P}_{\lambda,\upsilon}$ the set of probability density functions such that $I_{\lambda,\upsilon}(f) < +\infty$, and we will say that a random variable $X$ lies in the domain of attraction if $I_{\lambda,\upsilon}(X) < +\infty$.


Remark 1 . The characterization of the functions which belong to the domain of attraction 
of the relative fractional Fisher information is not an obvious task. However, it can be 
seen that this set of functions is not empty. We will present an explicit example in the 
Appendix. 


3 Monotonicity of the fractional Fisher information 

We now proceed to recover some useful properties of the relative fractional score function $\tilde{\rho}_\lambda(X)$ defined in (22). Most of these properties are easy generalizations of analogous ones proven for the standard linear score function. The main difference here is that we need to resort to the relative one. The following Lemma holds.

Lemma 1. Let $X_1$ and $X_2$ be independent random variables with smooth densities, and let $\rho^{(1)}$ (respectively $\rho^{(2)}$) denote their fractional scores. Then, for each constant $\lambda$, with $1 < \lambda < 2$, and each positive constant $\delta$, with $0 < \delta < 1$, the relative fractional score function of the sum $X_1 + X_2$ can be expressed as

$$\tilde{\rho}_\lambda(x) = E\left[\delta\,\tilde{\rho}^{(1)}_{\lambda,\delta}(X_1) + (1-\delta)\,\tilde{\rho}^{(2)}_{\lambda,1-\delta}(X_2) \,\Big|\, X_1 + X_2 = x\right]. \tag{26}$$


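Proof. Let $f_j$, $j = 1,2$, and $f$ be the densities of $X_j$, $j = 1,2$, and $X_1 + X_2$. Then the density of $X_1 + X_2$ is given by the convolution product of $f_1$ and $f_2$, $f = f_1 * f_2$. To start with, we remark that the fractional derivatives, as given by (11) and (12), behave with respect to convolutions in the same way as the usual derivatives. Indeed, thanks to (12), it follows that

$$\mathcal{D}_{\lambda-1} f(x) = (\mathcal{D}_{\lambda-1} f_1) * f_2(x) = f_1 * (\mathcal{D}_{\lambda-1} f_2)(x).$$

The previous identity can be rewritten in terms of expectations as

$$\begin{aligned}
\mathcal{D}_{\lambda-1} f(x) &= \mathcal{D}_{\lambda-1} E[f_1(x - X_2)] \\
&= E[\mathcal{D}_{\lambda-1} f_1(x - X_2)] \\
&= E\left[\rho^{(1)}(x - X_2)\, f_1(x - X_2)\right].
\end{aligned}$$

Therefore

$$\rho_\lambda(x) = \frac{\mathcal{D}_{\lambda-1} f(x)}{f(x)} = E\left[\rho^{(1)}(x - X_2)\,\frac{f_1(x - X_2)}{f(x)}\right] = E\left[\rho^{(1)}(X_1) \,\Big|\, X_1 + X_2 = x\right].$$

As usual, given the random variables $X$ and $Y$, we denote by $E[X|Y]$ the conditional expectation of $X$ given $Y$. Exchanging the indexes, we obtain an identical expression which relates the fractional score of the sum to the second variable $X_2$. Hence, for each positive constant $\delta$, with $0 < \delta < 1$, we have

$$\rho_\lambda(x) = \delta\,E\left[\rho^{(1)}(X_1) \,\Big|\, X_1 + X_2 = x\right] + (1-\delta)\,E\left[\rho^{(2)}(X_2) \,\Big|\, X_1 + X_2 = x\right]. \tag{27}$$

To conclude the proof, it is enough to remark that the following identity holds true

$$x = \delta\,E\left[\frac{X_1}{\delta} \,\Big|\, X_1 + X_2 = x\right] + (1-\delta)\,E\left[\frac{X_2}{1-\delta} \,\Big|\, X_1 + X_2 = x\right].$$

□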


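Lemma 1 has several interesting consequences. Since the norm of the relative fractional score is not less than that of its projection (i.e. by the Cauchy-Schwarz inequality), we obtain

$$\begin{aligned}
I_\lambda(X_1 + X_2) = E\left[\tilde{\rho}_\lambda^2(X_1 + X_2)\right] &\le E\left[\left(\delta\,\tilde{\rho}^{(1)}_{\lambda,\delta}(X_1) + (1-\delta)\,\tilde{\rho}^{(2)}_{\lambda,1-\delta}(X_2)\right)^2\right] \\
&= \delta^2 I_{\lambda,\delta}(X_1) + (1-\delta)^2 I_{\lambda,1-\delta}(X_2).
\end{aligned} \tag{28}$$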
Inequality (28) is the analogue of the Blachman-Stam inequality [7, 42], and allows one to bound the relative fractional Fisher information of the sum of independent variables in terms of the relative fractional Fisher information of its addends. Inequality (28) can be reduced to a normal form by making use of scaling arguments. Indeed, for any given random variable $X$ such that one of the two sides is bounded, and any positive constant $\upsilon$, the following identity holds

$$I_{\lambda,\upsilon}\left(\upsilon^{1/\lambda} X\right) = \upsilon^{-2(1-1/\lambda)}\, I_\lambda(X). \tag{29}$$

Using identity (29) in inequality (28), we can write it in the form

$$I_\lambda\left(\delta^{1/\lambda} X_1 + (1-\delta)^{1/\lambda} X_2\right) \le \delta^{2/\lambda} I_\lambda(X_1) + (1-\delta)^{2/\lambda} I_\lambda(X_2). \tag{30}$$

Note that equality in (30) holds if and only if both $X_1$ and $X_2$ are Lévy variables of order $\lambda$. Indeed, equality in (28) holds if and only if the relative score functions satisfy

$$\tilde{\rho}^{(1)}(x) = c_1; \qquad \tilde{\rho}^{(2)}(x) = c_2. \tag{31}$$

In fact, if this is the case, the Cauchy-Schwarz inequality we used to obtain (28) is satisfied with the equality sign. On the other hand, proceeding as in the derivation of (21), we conclude that (31) implies that the densities of the $X_j$'s, $j = 1,2$, have Fourier transforms

$$\widehat{f}_j(\xi) = \exp\left\{-|\xi|^\lambda + i c_j \xi\right\}. \tag{32}$$

We proved

Theorem 2. Let $X_j$, $j = 1,2$, be independent random variables such that their relative fractional Fisher information functions $I_\lambda(X_j)$, $j = 1,2$, are bounded for some $\lambda$, with $1 < \lambda < 2$. Then, for each constant $\delta$ with $0 < \delta < 1$, $I_\lambda(\delta^{1/\lambda} X_1 + (1-\delta)^{1/\lambda} X_2)$ is bounded, and inequality (30) holds. Moreover, there is equality in (30) if and only if, up to translation, both $X_j$, $j = 1,2$, are Lévy variables of exponent $\lambda$.

A posteriori, we can use the result of Theorem 2 to avoid inessential difficulties in proofs by means of a smoothing argument. Indeed, since we are interested in inequalities for convolutions of densities, and a Lévy symmetric stable law is stable with respect to the operation of convolution, we can consider in what follows densities suitably smoothed by convolution with a Lévy symmetric stable law. In fact, if $Z_1$ and $Z_2$ denote two independent copies of a symmetric Lévy stable law $Z$ of order $\lambda$, for any given positive constants $\epsilon_j$, $j = 1,2$, the random variable $\epsilon_1^{1/\lambda} Z_1 + \epsilon_2^{1/\lambda} Z_2$ is symmetric Lévy stable with the law of $(\epsilon_1 + \epsilon_2)^{1/\lambda} Z$. Therefore, for any given positive constant $\epsilon < 1$, and random variable $X$ with density function $f$, we will denote by $f_\epsilon$ the density of the random variable $X_\epsilon$ given by $(1-\epsilon)^{1/\lambda} X + \epsilon^{1/\lambda} Z$, where the symmetric stable Lévy variable $Z$ of order $\lambda$ is independent of $X$.

By virtue of Theorem 2,

$$I_\lambda(X_\epsilon) \le (1-\epsilon)^{2/\lambda}\, I_\lambda(X). \tag{33}$$

Hence, the relative fractional Fisher information of the smoothed version $X_\epsilon$ of the random variable is always smaller than the relative fractional Fisher information of $X$. Moreover,

$$\lim_{\epsilon \to 0} I_\lambda(X_\epsilon) = I_\lambda(X).$$

This statement follows from (33) and Fatou's lemma. Indeed, in view of (33), it only remains to show that $\liminf_{\epsilon \to 0} I_\lambda(X_\epsilon) \ge I_\lambda(X)$. On $\{f > 0\}$, $(\mathcal{D}_{\lambda-1} f_\epsilon)^2/f_\epsilon$ converges a.e. to $(\mathcal{D}_{\lambda-1} f)^2/f$. This is enough to imply the desired inequality by Fatou's lemma.

The next ingredient in the proof of monotonicity deals with the so-called variance drop inequality. The idea goes back at least to the pioneering work of Hoeffding on U-statistics [23]. Let $[n]$ denote the index set $\{1, 2, \ldots, n\}$, and, for any $s \subset [n]$, let $X_s$ stand for the collection of random variables $(X_i : i \in s)$, with the indices taken in their natural increasing order. Then we have

Theorem 3. Let the function $\Phi : \mathbb{R}^m \to \mathbb{R}$, with $1 \le m \in \mathbb{N}$, be symmetric in its arguments, and suppose that $E[\Phi(X_1, X_2, \ldots, X_m)] = 0$. Define

$$U(X_1, X_2, \ldots, X_n) = \frac{m!\,(n-m)!}{n!} \sum_{\{s \subset [n] : |s| = m\}} \Phi(X_s). \tag{34}$$

Then

$$E\left[U^2\right] \le \frac{m}{n}\, E\left[\Phi^2\right]. \tag{35}$$

In theoretical statistics, $U$ defined in (34) is called a U-statistic of degree $m$ with symmetric, mean zero kernel $\Phi$ that is applied to data of sample size $n$. Thus, the essence of inequality (35) is to give a quantitative estimate of the reduction of the variance of a U-statistic when the sample size $n$ increases. It is remarkable that, as soon as $m > 1$, the functions $\Phi(X_s)$ are no longer independent. Nevertheless, the variance of the U-statistic drops by a factor $m/n$.
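
The variance drop inequality can be verified exactly on a toy example. The Python sketch below is an illustration under explicit assumptions, not part of the paper: the data are taken to be independent and uniformly distributed on a small finite set, the helper name `variance_drop` is hypothetical, and exact rational arithmetic replaces any sampling. It enumerates all outcomes, computes $E[U^2]$ for the U-statistic built from a symmetric mean-zero kernel, and returns it together with the bound $(m/n)\,E[\Phi^2]$.

```python
from fractions import Fraction
from itertools import combinations, product
from math import comb


def variance_drop(kernel, m, n, support):
    """Return (E[U^2], (m/n) * E[kernel^2]) for i.i.d. data uniform on `support`.

    `kernel` must be symmetric in its m arguments and have mean zero
    under the uniform law on `support`.
    """
    prob = Fraction(1, len(support))
    subsets = list(combinations(range(n), m))
    e_u2 = Fraction(0)
    for xs in product(support, repeat=n):
        # U averages the kernel over all subsets of size m (weight 1 / C(n, m)).
        u = sum(Fraction(kernel(*(xs[i] for i in s))) for s in subsets) / comb(n, m)
        e_u2 += (prob ** n) * u * u
    e_k2 = sum((prob ** m) * Fraction(kernel(*xs)) ** 2 for xs in product(support, repeat=m))
    return e_u2, Fraction(m, n) * e_k2


if __name__ == "__main__":
    # Product kernel of degree 2 on three observations uniform on {-1, 0, 1}.
    print(variance_drop(lambda a, b: a * b, 2, 3, (-1, 0, 1)))
    # Linear kernel of degree n - 1, the leave-one-out case.
    print(variance_drop(lambda a, b, c: a + b + c, 3, 4, (-1, 0, 1)))
```

For the linear leave-one-out kernel the two returned values coincide, which shows that the factor $m/n$ cannot be improved in general; for the product kernel the inequality is strict.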

With the essential help of Theorem 3 we prove

Theorem 4. Let $T_n$ denote the sum (4), where the random variables $X_j$ are independent copies of a centered random variable $X$ with bounded relative $\lambda$-Fisher information, $1 < \lambda < 2$. Then, for each $n > 1$, the relative $\lambda$-Fisher information of $T_n$ is decreasing in $n$, and the following bound holds

$$I_\lambda(T_n) \le \left(\frac{n-1}{n}\right)^{(2-\lambda)/\lambda} I_\lambda(T_{n-1}). \tag{36}$$

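Proof. In what follows, for $n > 1$, $S_n = \sum_{j \in [n]} Y_j$ will denote the (unnormalized) sum of the independent and identically distributed random variables $Y_j$. Likewise, $S_n^{(k)} = \sum_{j \ne k} Y_j$ will denote the leave-one-out sum leaving out $Y_k$. Since $S_n = S_n^{(k)} + Y_k$, for each $k \in [n]$ we can write

$$x = E[Y_k | S_n = x] + E[S_n^{(k)} | S_n = x].$$

On the other hand, since the $Y_j$ are independent and identically distributed, we have the identity

$$E[S_n^{(k)} | S_n = x] = (n-1)\,E[Y_k | S_n = x],$$

which implies that we can write

$$x = \frac{n}{n-1}\,E[S_n^{(k)} | S_n = x].$$

Hence, for the relative fractional score of $S_n$ we conclude that, if $\upsilon_n = (n-1)/n$,

$$\tilde{\rho}_\lambda(S_n) = E\left[\tilde{\rho}_{\lambda,\upsilon_n}(S_n^{(k)}) \,\Big|\, S_n\right]$$

for all $k \in [n]$, and hence

$$\tilde{\rho}_\lambda(S_n) = E\left[\frac{1}{n}\sum_{k \in [n]} \tilde{\rho}_{\lambda,\upsilon_n}(S_n^{(k)}) \,\Big|\, S_n\right].$$

Proceeding as in Lemma 1, namely by using the fact that the norm of the fractional relative score is not less than that of its projection, we obtain

$$I_\lambda(S_n) = E\left[\tilde{\rho}_\lambda^2(S_n)\right] \le E\left[\left(\frac{1}{n}\sum_{k \in [n]} \tilde{\rho}_{\lambda,\upsilon_n}(S_n^{(k)})\right)^2\right].$$

At this point, Theorem 3, applied with $m = n-1$, yields

$$E\left[\left(\frac{1}{n}\sum_{k \in [n]} \tilde{\rho}_{\lambda,\upsilon_n}(S_n^{(k)})\right)^2\right] \le \frac{n-1}{n}\,E\left[\tilde{\rho}_{\lambda,\upsilon_n}^2(S_n^{(1)})\right] = \frac{n-1}{n}\, I_{\lambda,\upsilon_n}(S_{n-1}).$$

If we suppose that the right-hand side in the previous inequality is bounded, we obtain, for $n > 1$, the bound

$$I_\lambda(S_n) \le \frac{n-1}{n}\, I_{\lambda,\upsilon_n}(S_{n-1}). \tag{37}$$

To conclude, let us choose $Y_k = X_k / n^{1/\lambda}$. In this case, $S_n = T_n$, where $T_n$ is the sum (4). Moreover

$$S_{n-1} = \frac{X_1 + X_2 + \cdots + X_{n-1}}{n^{1/\lambda}} = \left(\frac{n-1}{n}\right)^{1/\lambda} T_{n-1} = \upsilon_n^{1/\lambda}\, T_{n-1}.$$

On the other hand, thanks to formula (29),

$$I_{\lambda,\upsilon_n}\left(\upsilon_n^{1/\lambda} T_{n-1}\right) = \upsilon_n^{-2(1-1/\lambda)}\, I_\lambda(T_{n-1}).$$

Substituting into (37) gives the result, since $\upsilon_n \cdot \upsilon_n^{-2(1-1/\lambda)} = \upsilon_n^{(2-\lambda)/\lambda}$.

□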
Remark 2. Surprisingly enough, in contrast with the case of the standard central limit theorem, where $\lambda = 2$ and the monotonicity result of the classical relative Fisher information reads $\tilde{I}(S_n) \le \tilde{I}(S_{n-1})$, in the case of the central limit theorem for stable laws the monotonicity of the relative $\lambda$-Fisher information also gives a rate of decay. Indeed, formula (36) of Theorem 4 shows that, for all $n > 1$,

$$I_\lambda(T_n) \le \left(\frac{1}{n}\right)^{(2-\lambda)/\lambda} I_\lambda(X), \tag{38}$$

namely convergence in relative $\lambda$-Fisher information sense at rate $1/n^{(2-\lambda)/\lambda}$.
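
The passage from the one-step bound to the global rate is a telescoping product: iterating the factor $((k-1)/k)^{(2-\lambda)/\lambda}$ for $k = 2, \ldots, n$ gives $n^{-(2-\lambda)/\lambda}$. The short Python sketch below (function names are illustrative, not from the paper) computes both sides so the arithmetic can be checked for any $1 < \lambda < 2$.

```python
def decay_factor_telescoped(n: int, lam: float) -> float:
    """Product over k = 2..n of ((k - 1) / k) ** ((2 - lam) / lam), the iterated one-step factor."""
    exponent = (2.0 - lam) / lam
    factor = 1.0
    for k in range(2, n + 1):
        factor *= ((k - 1) / k) ** exponent
    return factor


def decay_factor_closed_form(n: int, lam: float) -> float:
    """Closed form n ** (-(2 - lam) / lam) of the same product."""
    return n ** (-(2.0 - lam) / lam)


if __name__ == "__main__":
    for lam in (1.1, 1.5, 1.9):
        print(lam, decay_factor_telescoped(1000, lam), decay_factor_closed_form(1000, lam))
```

The exponent $(2-\lambda)/\lambda$ tends to zero as $\lambda \to 2$, so the factor approaches 1 and only monotonicity survives in the Gaussian limit, in agreement with the remark above.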

Remark 3. This result highlights, at the level of Fisher information, a strong difference between the classical central limit theorem and the central limit theorem for stable laws. In the former case, the domain of attraction is very large and contains all random variables with finite variance, while the attraction in terms of relative Fisher information is, without additional assumptions, very weak (only monotonicity is guaranteed). In the latter, the domain of attraction is very restricted and contains only random variables whose distribution has the same tails at infinity as the Lévy stable law. However, in this case the attraction in terms of the relative fractional Fisher information is very strong, and it is inversely proportional to the exponent $\lambda$ which characterizes the Lévy stable law.

4 The relative λ-entropy

The previous results show that information-theoretic techniques can be fruitfully employed to obtain a new approach to the central limit theorem for stable laws. In particular, the new concept of relative fractional Fisher information seems nicely adapted to the subject, since most of the classical properties can be easily extended to this case. For instance, subadditivity of the classical Fisher information with respect to weighted convolution is shown to hold also for the relative fractional Fisher information.

However, up to now, Fisher information has been considered as a useful instrument to obtain results for Shannon's entropy functional defined in (1). The main finding in this context was in fact the proof of the monotonicity of Shannon's entropy on the weighted sums in the central limit theorem. When stable laws are concerned, in contrast with Fisher information, it seems difficult to find an explicit expression for the (nonlocal) counterpart of Shannon's entropy, say the relative fractional entropy.

One possible way to obtain a consistent definition would be to establish between the relative fractional entropy and the relative fractional Fisher information the same link which connects Shannon entropy to Fisher information through the solution to the heat equation (3). In addition to the de Bruijn relation, further connections between entropy and Fisher information have been established by Barron [3] and Carlen and Soffer [13]. Given two random variables $X$ and $Y$ of densities $f(x)$ and, respectively, $g(x)$, the relative entropy $H(X|Y)$ of $X$ and $Y$ is defined as

$$H(X|Y) = H(f|g) = \int_{\mathbb{R}} f(x)\log\frac{f(x)}{g(x)}\,dx.$$
If $X$ is a random variable with a density $f(x)$ and arbitrary finite variance, and $f_t(x) = f(x,t)$ denotes the solution to the heat equation (3) such that $f(x, t=0) = f(x)$, then it holds

$$H(X|Z) = \int_0^\infty \tilde{I}(f_t)\,dt,$$

where $Z$ denotes as usual the Gaussian density of variance 1, and $\tilde{I}(f_t)$ is the relative (to the Gaussian of variance $1+t$) Fisher information defined in (18). In an analogous way, one can define the relative (to the stable law) entropy. Let $X$ be a random variable with density $f(x)$, and let $f_t(x) = f(x,t)$ denote the solution to the fractional diffusion equation (6) such that $f(x, t=0) = f(x)$. Then, it appears natural to define the fractional relative entropy of order $\lambda$ as

$$H_\lambda(X) = \int_0^\infty I_{\lambda,1+t}(f_t)\,dt, \tag{39}$$

where $I_{\lambda,1+t}(f_t)$ denotes the relative fractional Fisher information defined as in (25).

The relative fractional entropy given by (39) is well-defined. In fact, if the random variable $X$ has a density $f(x)$, the solution to the fractional diffusion equation (6) is the density $f(x,t)$ of the sum $X_t = X + t^{1/\lambda} Z$. By identity (29),

$$I_{\lambda,1+t}(f_t) = I_{\lambda,1+t}(X_t) = (1+t)^{-2(1-1/\lambda)}\, I_\lambda\left(X_t (1+t)^{-1/\lambda}\right).$$

In addition, since

$$X_t (1+t)^{-1/\lambda} = \left(\frac{1}{1+t}\right)^{1/\lambda} X + \left(\frac{t}{1+t}\right)^{1/\lambda} Z,$$

thanks to inequality (33) it holds

$$I_\lambda\left(X_t (1+t)^{-1/\lambda}\right) \le (1+t)^{-2/\lambda}\, I_\lambda(X).$$

Finally,

$$I_{\lambda,1+t}(f_t) \le (1+t)^{-2}\, I_\lambda(X),$$

so that, integrating both sides from 0 to $\infty$, we obtain

$$H_\lambda(X) \le I_\lambda(X). \tag{40}$$


Analogously to the classical case, inequality (40) implies that the domain of attraction of the relative $\lambda$-Fisher information is a subset of the domain of attraction of the relative $\lambda$-entropy.

Despite its complicated structure, as happens in the classical situation, all inequalities satisfied by the relative fractional Fisher information also hold for the relative fractional entropy. By formula (39) we can easily show that Theorem 4 is also valid for the fractional relative entropy of order $\lambda$. In the classical case, however, the Csiszár-Kullback inequality allows one to pass from convergence in relative entropy to convergence in $L^1(\mathbb{R})$. It would be interesting to show that a similar result still holds in the case of the relative fractional entropy, but at present this remains an open question.
5 Conclusions

Starting from the pioneering work of Linnik [34], the role of Fisher information in obtaining alternative proofs of the central limit theorem has been highlighted by a number of papers (cf. [13, 28, 29, 36] and the references therein). Only recently, Bobkov, Chistyakov and Götze [10] used the relative Fisher information to study convergence to stable laws. In contrast with the existing literature, we studied the role of Fisher-like functionals in the central limit theorem for stable laws by resorting to the new concept of relative fractional Fisher information. This nonlocal functional relies on the consideration of a linear fractional score function. Just as the linear score function identifies Gaussian variables as the unique random variables for which the score is linear, Lévy symmetric stable laws are here identified as the unique random variables for which the fractional linear score is linear. This analogy is pushed further to show that the relative fractional Fisher information, defined as the variance of the relative score, satisfies almost all properties of the classical relative Fisher information. While the fractional Fisher information represents in our opinion a powerful instrument to study convergence towards symmetric stable laws, the role of the analogue of Shannon's entropy in this context remains at present obscure, and deserves further investigation.

6 Appendix 

To show that the domain of attraction of the fractional relative Fisher information constitutes a notion that can be fruitfully used, we regard it as of paramount importance to prove that this domain is not an empty subset of the classical domain of attraction of the stable law. To this aim, we will provide in this appendix an explicit example of a density which belongs both to the domain of attraction of the stable law and to the domain of attraction of the relative fractional Fisher information.

To start with, let us briefly recall some information about the domain of attraction of a stable law. More details can be found in the book [23] or, among others, in [5]. A centered distribution $F$ belongs to the domain of normal attraction of the $\lambda$-stable law (5) with distribution function $L_\lambda(x)$ if and only if $F$ satisfies $|x|^\lambda F(x) \to c$ as $x \to -\infty$ and $x^\lambda (1 - F(x)) \to c$ as $x \to +\infty$, i.e.

$$\begin{aligned}
&F(-x) = \frac{c}{|x|^\lambda} + S_1(-x) \quad\text{and}\quad 1 - F(x) = \frac{c}{x^\lambda} + S_2(x) \qquad (x > 0), \\
&S_i(x) = o(|x|^{-\lambda}) \quad\text{as } |x| \to +\infty, \quad i = 1,2,
\end{aligned} \tag{41}$$

where $c = \frac{\Gamma(\lambda)}{\pi}\sin\left(\frac{\pi\lambda}{2}\right)$.

If the distribution function $F$ belongs to the domain of normal attraction of the $\lambda$-stable law, then for any $\nu$ such that $0 < \nu < \lambda$ [24]

$$\int_{\mathbb{R}} |x|^\nu\,dF(x) < +\infty. \tag{42}$$

The behavior of $F$ in the physical space (41) leads to a characterization of the domain of normal attraction of the $\lambda$-stable law (5) in terms of Fourier transform. Indeed, if $\widehat{f}$ is the Fourier transform of the distribution function $F$ satisfying (41), then

$$\widehat{f}(\xi) = 1 - (1 - R(\xi))\,|\xi|^\lambda, \tag{43}$$

where $|R(\xi)| = o(1)$ as $\xi \to 0$.

Let us consider a probability density f{x) that belongs to the domain of attraction of Lx, 
with A > 1. Then, we have enough regularity to reckon the Fourier transform of 

Ta(x) = T>a/(x) + 

and, thanks to (1431) we obtain 


^a(x) = +^(0- 

It is known that the leading small $\xi$-behavior of the singular component of the Fourier
transform (50) reflects an algebraic tail of decay of the distribution function (cf. for
example Wong [47]). In our case, since $\widehat{\Psi}_\lambda(\xi)$ contains the term
$|\xi|^{2\lambda-1}$, $\Psi_\lambda(x)$ should decay at infinity as $|x|^{-2\lambda}$.

The leading example of a function which belongs to the domain of attraction of the
$\lambda$-stable law is the so-called Linnik distribution [33], [35], expressed in the Fourier
variable by

$$ \hat p_\lambda(\xi) = \frac{1}{1 + |\xi|^\lambda}. \tag{44} $$

For all $0 < \lambda \le 2$, the function (44) is the characteristic function of a symmetric
probability distribution. In addition, when $\lambda > 1$, $\hat p_\lambda \in L^1(\mathbb{R})$, which, by
applying the inversion formula, shows that $p_\lambda$ is a probability density function.

The main properties of Linnik's distributions can be extracted from their representation
as a mixture (cf. Kotz and Ostrovskii [26]). For any given pair of positive constants $a$ and
$b$, with $0 < a < b \le 2$, let $g(s, a, b)$ denote the probability density

$$ g(s, a, b) = \left( \frac{b}{\pi} \sin\frac{\pi a}{b} \right) \frac{s^{a-1}}{1 + s^{2a} + 2 s^a \cos\frac{\pi a}{b}}, \qquad 0 < s < \infty. $$
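As a quick sanity check on the formula for $g(s, a, b)$, the following minimal sketch integrates it numerically over $(0, \infty)$ and compares the result with 1. The function names and the quadrature (a trapezoid rule after the substitution $s = e^t$, with hand-picked truncation limits) are illustrative choices and are not taken from [26].

```python
import math


def mixing_density(s, a, b):
    """Mixing density g(s, a, b) for 0 < a < b <= 2 and s > 0."""
    theta = math.pi * a / b
    numerator = (b / math.pi) * math.sin(theta) * s ** (a - 1.0)
    denominator = 1.0 + s ** (2.0 * a) + 2.0 * s ** a * math.cos(theta)
    return numerator / denominator


def integrate_half_line(func, t_min=-40.0, t_max=40.0, steps=8000):
    """Trapezoid rule for the integral of func over (0, inf), using s = exp(t)."""
    h = (t_max - t_min) / steps
    total = 0.0
    for k in range(steps + 1):
        s = math.exp(t_min + k * h)
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * func(s) * s  # ds = s dt
    return total * h


def total_mass(a, b):
    """Numerical value of the integral of g(., a, b) over (0, inf)."""
    return integrate_half_line(lambda s: mixing_density(s, a, b))


if __name__ == "__main__":
    for a, b in [(1.5, 2.0), (1.2, 2.0), (0.8, 1.6)]:
        print(a, b, total_mass(a, b))
```

The logarithmic substitution is used because the integrand behaves like a power of $s$ both near zero and at infinity, so it decays exponentially in $t$ at both ends.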


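Then, the following equality holds [26]

$$ \hat p_a(\xi) = \int_0^\infty \hat p_b(\xi/s)\, g(s, a, b)\, ds, \tag{45} $$

or, equivalently, since the inverse Fourier transform of $\hat p_b(\xi/s)$ is $s\, p_b(sx)$,

$$ p_a(x) = \int_0^\infty s\, p_b(sx)\, g(s, a, b)\, ds. $$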

This representation allows us to generate Linnik distributions of different parameters
starting from a convenient base, typically from the Laplace distribution (corresponding to
$b = 2$). In this case, since $\hat p_2(\xi) = 1/(1 + |\xi|^2)$ (alternatively $p_2(x) = e^{-|x|}/2$ in the
physical space), for any $\lambda$ with $1 < \lambda < 2$ we obtain the explicit representation

$$ \hat p_\lambda(\xi) = \int_0^\infty \frac{s^2}{s^2 + |\xi|^2}\, g(s, \lambda, 2)\, ds, \tag{46} $$

or, in the physical space,

$$ p_\lambda(x) = \int_0^\infty \frac{s}{2}\, e^{-s|x|}\, g(s, \lambda, 2)\, ds. \tag{47} $$
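The representations (46) and (47) lend themselves to a direct numerical check. The sketch below assumes the Laplace-mixture kernels $s^2/(s^2 + |\xi|^2)$ in Fourier space and $(s/2) e^{-s|x|}$ in physical space; it compares the mixture integral with $1/(1 + |\xi|^\lambda)$ and estimates the algebraic tail exponent of the density from a log-slope. All function names, the quadrature, and the sample points are hypothetical choices made for illustration.

```python
import math


def mixing_density(s, lam):
    """Mixing density g(s, lam, 2) for 1 < lam < 2 and s > 0."""
    theta = math.pi * lam / 2.0
    numerator = (2.0 / math.pi) * math.sin(theta) * s ** (lam - 1.0)
    denominator = 1.0 + s ** (2.0 * lam) + 2.0 * s ** lam * math.cos(theta)
    return numerator / denominator


def integrate_half_line(func, t_min=-40.0, t_max=40.0, steps=8000):
    """Trapezoid rule for the integral of func over (0, inf), using s = exp(t)."""
    h = (t_max - t_min) / steps
    total = 0.0
    for k in range(steps + 1):
        s = math.exp(t_min + k * h)
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * func(s) * s  # ds = s dt
    return total * h


def linnik_cf(xi, lam):
    """Linnik characteristic function 1 / (1 + |xi|^lam)."""
    return 1.0 / (1.0 + abs(xi) ** lam)


def linnik_cf_from_mixture(xi, lam):
    """Mixture of Laplace characteristic functions, as in (46)."""
    return integrate_half_line(
        lambda s: s * s / (s * s + xi * xi) * mixing_density(s, lam))


def linnik_density(x, lam):
    """Mixture of rescaled Laplace densities, as in (47)."""
    return integrate_half_line(
        lambda s: 0.5 * s * math.exp(-s * abs(x)) * mixing_density(s, lam))


def tail_exponent(lam, x=50.0):
    """Log-slope of the density between x and 2x (an estimate of the tail power)."""
    ratio = linnik_density(2.0 * x, lam) / linnik_density(x, lam)
    return math.log(ratio) / math.log(2.0)


if __name__ == "__main__":
    lam = 1.5
    for xi in (0.3, 1.0, 5.0):
        print(xi, linnik_cf_from_mixture(xi, lam), linnik_cf(xi, lam))
    print("estimated tail exponent:", tail_exponent(lam))
```

For $\lambda = 1.5$ the estimated exponent should be close to $-(1 + \lambda) = -2.5$, in line with the algebraic decay expected for a density in the domain of attraction of the stable law of order $\lambda$.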

Owing to (47) we obtain easily that, for $1 < \lambda < 2$, Linnik's probability density is a
symmetric and bounded function, non-increasing and convex for $x > 0$. Moreover, since
Linnik's distribution belongs to the domain of attraction of the stable law of order $\lambda$,
$p_\lambda(x)$ decays to zero like $|x|^{-(1+\lambda)}$ as $|x| \to \infty$. These properties ensure that there
exist positive constants $A_\lambda$ and $B_\lambda$ such that

$$ p_\lambda^{-1}(x) \le A_\lambda + B_\lambda |x|^{\lambda+1}. \tag{48} $$


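By virtue of (48), we obtain the bound

$$ \int_{\mathbb{R}} \left( \mathcal{D}_{\lambda-1} p_\lambda(x) + \frac{x}{\lambda}\, p_\lambda(x) \right)^2 p_\lambda^{-1}(x)\, dx \le \int_{\mathbb{R}} g_\lambda^2(x) \left( A_\lambda + B_\lambda |x|^{\lambda+1} \right) dx. \tag{49} $$

In (49) we defined

$$ g_\lambda(x) = \mathcal{D}_{\lambda-1} p_\lambda(x) + \frac{x}{\lambda}\, p_\lambda(x). $$

Explicit computations give

$$ \hat g_\lambda(\xi) = i\, \frac{\xi}{|\xi|}\, \frac{|\xi|^{2\lambda-1}}{\left(1 + |\xi|^\lambda\right)^2}. \tag{50} $$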


As discussed before, the algebraic tail of decay of a function is given by the leading small
$\xi$-behavior of the singular component of its Fourier transform (50). Hence, while the leading
singular component in the Linnik distribution (44) is $|\xi|^\lambda$, which induces condition (42), by
(50) the leading singular component in $\hat g_\lambda$ is $|\xi|^{2\lambda-1}$, which would imply that the
right-hand side of (49) is bounded, owing to the fact that $4\lambda - 2 > \lambda + 1$ for $\lambda > 1$. This
suggests that the relative fractional Fisher information of Linnik's distribution is bounded.

An explicit proof of this property follows from the representation (45). To this end,
note that the function $\hat g_\lambda(\xi)$ defined in (50) can also be written in the form

$$ \hat g_\lambda(\xi) = -i\, \frac{|\xi|^\lambda}{\lambda}\, \frac{d \hat p_\lambda(\xi)}{d\xi}. $$
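The agreement between the closed form of $\hat g_\lambda$ and its expression through the derivative of $\hat p_\lambda$ can be checked numerically. The sketch below takes $\hat p_\lambda(\xi) = 1/(1 + |\xi|^\lambda)$, drops the common factor $i$ (whose sign depends on the Fourier convention, assumed here), and compares $\mathrm{sign}(\xi)\,|\xi|^{2\lambda-1}/(1 + |\xi|^\lambda)^2$ with $-(|\xi|^\lambda/\lambda)\, d\hat p_\lambda/d\xi$ computed by a central difference. Function names and sample points are illustrative.

```python
import math


def linnik_cf(xi, lam):
    """Linnik characteristic function 1 / (1 + |xi|^lam)."""
    return 1.0 / (1.0 + abs(xi) ** lam)


def g_hat_closed_form(xi, lam):
    """sign(xi) |xi|^(2 lam - 1) / (1 + |xi|^lam)^2, with the factor i omitted."""
    magnitude = abs(xi) ** (2.0 * lam - 1.0) / (1.0 + abs(xi) ** lam) ** 2
    return math.copysign(magnitude, xi)


def g_hat_from_derivative(xi, lam, step=1e-6):
    """-(|xi|^lam / lam) times d/dxi of linnik_cf, via a central difference."""
    slope = (linnik_cf(xi + step, lam) - linnik_cf(xi - step, lam)) / (2.0 * step)
    return -(abs(xi) ** lam / lam) * slope


def max_mismatch(lam, points=(-3.0, -0.5, 0.7, 2.0)):
    """Largest absolute difference between the two expressions on the sample points."""
    return max(abs(g_hat_closed_form(xi, lam) - g_hat_from_derivative(xi, lam))
               for xi in points)


if __name__ == "__main__":
    for lam in (1.2, 1.5, 1.8):
        print(lam, max_mismatch(lam))
```

The sample points stay away from $\xi = 0$, where the finite difference would straddle the kink of $|\xi|^\lambda$.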


Therefore, differentiating under the integral sign in (46), we obtain the equivalent expression

$$ \hat g_\lambda(\xi) = \int_0^\infty \hat h_\lambda(\xi/s)\, s^{\lambda-1}\, g(s, \lambda, 2)\, ds, \tag{51} $$

where

$$ \hat h_\lambda(\xi) = \frac{2 i\, \xi\, |\xi|^\lambda}{\lambda \left(1 + |\xi|^2\right)^2}. \tag{52} $$


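Since $1 < \lambda < 2$, it follows that both $\hat h_\lambda$ and $\hat h_\lambda''$ belong to $L^2(\mathbb{R})$. Hence, by
Plancherel's identity we obtain

$$ \int_{\mathbb{R}} |x|^4 h_\lambda^2(x)\, dx = \int_{\mathbb{R}} \left| x^2 h_\lambda(x) \right|^2 dx = \frac{1}{2\pi} \int_{\mathbb{R}} \left| \hat h_\lambda''(\xi) \right|^2 d\xi < +\infty. $$

Thus, for any given $R > 0$,

$$
\begin{aligned}
\int_{\mathbb{R}} |x|^{\lambda+1} h_\lambda^2(x)\, dx
&\le \int_{\{|x| \le R\}} |x|^{\lambda+1} h_\lambda^2(x)\, dx + \int_{\{|x| > R\}} |x|^{\lambda+1} h_\lambda^2(x)\, dx \\
&\le R^{\lambda+1} \int_{\mathbb{R}} h_\lambda^2(x)\, dx + \frac{1}{R^{3-\lambda}} \int_{\mathbb{R}} |x|^4 h_\lambda^2(x)\, dx.
\end{aligned}
\tag{53}
$$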

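Optimizing over $R$ we obtain

$$ \int_{\mathbb{R}} |x|^{\lambda+1} h_\lambda^2(x)\, dx \le C_\lambda \left( \int_{\mathbb{R}} h_\lambda^2(x)\, dx \right)^{(3-\lambda)/4} \left( \int_{\mathbb{R}} |x|^4 h_\lambda^2(x)\, dx \right)^{(1+\lambda)/4}, $$

where $C_\lambda$ is an explicitly computable constant. Note that, in view of (53), the right-hand
side of the previous inequality is bounded. Finally, since $g(\cdot, \lambda, 2)$ is a probability
density, by Jensen's inequality we have

$$
\begin{aligned}
\int_{\mathbb{R}} |x|^{\lambda+1} g_\lambda^2(x)\, dx
&= \int_{\mathbb{R}} |x|^{\lambda+1} \left| \int_0^\infty s^\lambda h_\lambda(sx)\, g(s, \lambda, 2)\, ds \right|^2 dx \\
&\le \int_{\mathbb{R}} |x|^{\lambda+1} \int_0^\infty \left( h_\lambda(sx)\, s^\lambda \right)^2 g(s, \lambda, 2)\, ds\, dx \\
&= \int_{\mathbb{R}} |x|^{\lambda+1} h_\lambda^2(x)\, dx \int_0^\infty s^{\lambda-2}\, g(s, \lambda, 2)\, ds,
\end{aligned}
$$

where the last equality follows from the change of variable $y = sx$ in the inner integral
with respect to $x$.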
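The optimization over $R$ used above can be made explicit. The sketch below reads the two-term bound as $A R^{\lambda+1} + B R^{-(3-\lambda)}$, where the shorthand $A$, $B$ and $\varphi$ is introduced here only for illustration; under that reading, elementary calculus gives one admissible value of the constant $C_\lambda$.

```latex
\[
\varphi(R) = A\, R^{\lambda+1} + B\, R^{-(3-\lambda)}, \qquad
A = \int_{\mathbb{R}} h_\lambda^2(x)\, dx, \quad
B = \int_{\mathbb{R}} |x|^4 h_\lambda^2(x)\, dx .
\]
\[
\varphi'(R) = (\lambda+1)\, A\, R^{\lambda} - (3-\lambda)\, B\, R^{\lambda-4} = 0
\quad \Longleftrightarrow \quad
R_*^4 = \frac{(3-\lambda)\, B}{(\lambda+1)\, A} .
\]
\[
\varphi(R_*) = C_\lambda\, A^{(3-\lambda)/4}\, B^{(1+\lambda)/4}, \qquad
C_\lambda = \frac{4}{\lambda+1} \left( \frac{\lambda+1}{3-\lambda} \right)^{(3-\lambda)/4} .
\]
```

Since $\lambda + 1 > 0$ and $\lambda - 3 < 0$, $\varphi$ blows up as $R \to 0$ and as $R \to \infty$, so the critical point $R_*$ is the global minimum. For $\lambda = 1$ the formula reduces to $\varphi(R_*) = 2 \sqrt{AB}$.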

By definition, the probability density $g(s, \lambda, 2) \in L^1(\mathbb{R}_+) \cap L^\infty(\mathbb{R}_+)$. This implies
that, for $1 < \lambda < 2$,

$$ \int_0^\infty s^{\lambda-2}\, g(s, \lambda, 2)\, ds < +\infty. $$

Hence the relative fractional Fisher information of Linnik's density is bounded.